DarSwin: Distortion Aware Radial Swin Transformer (2304.09691v5)

Published 19 Apr 2023 in cs.CV

Abstract: Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses, analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets, with significant gains when trained on bounded levels of distortion (very low, low, medium, and high) and tested on all levels, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/
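The radial patch partitioning described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a Kannala-Brandt-style polynomial radial profile (one common analytic fisheye model; DarSwin only requires that some such profile be known or estimated) and places patch boundaries at equal increments of the incident angle θ, so that equal angular slices map to non-uniform radii on the distorted image plane. The helper name and coefficients are hypothetical.

```python
import numpy as np

def radial_patch_boundaries(theta_max, n_radial, coeffs):
    """Hypothetical helper: map equal incident-angle bins to image-plane
    radii via a Kannala-Brandt-style radial profile
    r(theta) = k1*theta + k2*theta**3 + k3*theta**5 + k4*theta**7."""
    k1, k2, k3, k4 = coeffs
    # Partition the half field of view into equal angular slices; under
    # lens distortion these map to unequal radial rings on the image plane.
    thetas = np.linspace(0.0, theta_max, n_radial + 1)
    return k1 * thetas + k2 * thetas**3 + k3 * thetas**5 + k4 * thetas**7

# Example: 8 radial bins for a lens with a 120-degree field of view
# (coefficients assumed for illustration only).
bounds = radial_patch_boundaries(np.deg2rad(60.0), 8, (1.0, -0.05, 0.01, 0.0))
print(np.round(bounds, 3))
```

Sampling tokens within these rings, rather than on a uniform grid, is what lets the encoder follow the lens geometry instead of fighting it.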

Definition Search Book Streamline Icon: https://streamlinehq.com
References (77)
  1. Kim, H., Chae, E., Jo, G., Paik, J.: Fisheye lens-based surveillance camera for wide field-of-view monitoring. In: IEEE Int. Conf. Cons. Elec. (2015) Schmalstieg and Höllerer [2017] Schmalstieg, D., Höllerer, T.: Augmented reality: Principles and practice. In: IEEE Virt. Reality (2017) Deng et al. [2019] Deng, L., Yang, M., Li, H., Li, T., Bing, h., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Int. Trans. Syst. (2019) Yogamani et al. [2019] Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. 
(2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Schmalstieg, D., Höllerer, T.: Augmented reality: Principles and practice. In: IEEE Virt. Reality (2017) Deng et al. [2019] Deng, L., Yang, M., Li, H., Li, T., Bing, h., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Int. Trans. Syst. (2019) Yogamani et al. [2019] Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. 
(2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Deng, L., Yang, M., Li, H., Li, T., Bing, h., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Int. Trans. Syst. (2019) Yogamani et al. [2019] Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. 
(2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. 
[2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. 
[2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. 
[2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. 
(2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. 
[2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. 
[2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. 
In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. 
Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. 
[2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial Swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Schmalstieg and Höllerer [2017] Schmalstieg, D., Höllerer, T.: Augmented reality: Principles and practice. In: IEEE Virt. Reality (2017)
Deng et al. [2019] Deng, L., Yang, M., Li, H., Li, T., Hu, B., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Int. Trans. Syst. (2019)
Yogamani et al. [2019] Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., O'Dea, D., Pérez, P.: WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021)
Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018)
Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011)
Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012)
Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019)
Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017)
Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a "plumb-line" approach. In: Int. Conf. Comput. Vis. (2013)
Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1335–1340 (2006)
Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: FisheyeRecNet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018)
Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3D object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021)
Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
[2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Deng, L., Yang, M., Li, H., Li, T., Bing, h., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Int. Trans. Syst. (2019) Yogamani et al. [2019] Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. 
[2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. 
(2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., Odea, D., Pérez, P.: Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. 
[2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. 
[2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. 
(2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. 
Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. 
arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. 
Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
  3. Deng, L., Yang, M., Li, H., Li, T., Hu, B., Wang, C.: Restricted deformable convolution-based road scene semantic segmentation using surround view cameras. IEEE Trans. Intell. Transp. Syst. (2019)
  4. Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Chennupati, S., Uricar, M., Milz, S., Simon, M., Amende, K., Witt, C., Rashed, H., Nayak, S., Mansoor, S., Varley, P., Perrotton, X., O'Dea, D., Pérez, P.: WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving. In: Int. Conf. Comput. Vis. (2019)
  5. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021)
  6. Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018)
  7. Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011)
  8. Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012)
  9. Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019)
 10. Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017)
 11. Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
 12. Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a "plumb-line" approach. In: Int. Conf. Comput. Vis. (2013)
 13. Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1335–1340 (2006)
 14. Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: FisheyeRecNet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018)
 15. Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
 16. Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3D object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021)
 17. Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
 18. Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
 19. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
 20. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
 21. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
 22. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learn. (2019)
 23. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
 24. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
 25. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
 26. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
 27. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
 28. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
 29. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
 30. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
 31. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
 32. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Lett. 3(4), 3153–3160 (2018)
 33. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
 34. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
 35. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
 36. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
 37. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
 38. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transp. Syst. (2022)
 39. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
 40. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
 41. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
 42. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
 43. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
 44. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
 45. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
 46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
 47. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
 48. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
 49. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
 50. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
 51. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
 52. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
 53. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
 54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
 55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transp. Syst. 23(8), 10252–10261 (2021)
 56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
 57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
 58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
 59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
 60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
 61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
 62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
 63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
 64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
 65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
 66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
 67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
 68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
 69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
 70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
 71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
 72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
 73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
 74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
 75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
 76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
 77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. 
[2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. 
[2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
[2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. 
[2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. 
(2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. 
Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
[2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. 
In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. 
IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
  5. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Int. Conf. Comput. Vis. (2021) Huang et al. [2018] Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Eur. Conf. Comput. Vis. (2018) Torralba and Efros [2011] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. 
[2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Torralba, A., Efros, A.A.: Unbiased look at dataset bias. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2011) Khosla et al. [2012] Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. 
Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. 
[2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. 
[2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
(2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. 
[2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. 
Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Khosla, A., Zhou, T., Malisiewicz, T., Efros, A.A., Torralba, A.: Undoing the damage of dataset bias. In: Eur. Conf. Comput. Vis. (2012) Brousseau and Roy [2019] Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. 
[2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. 
[2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. 
IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. 
IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
20. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
21. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
22. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
23. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
24. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
25. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
26. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
27. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
28. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
29. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
30. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
31. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
32. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
33. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
34. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
35. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
36. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
37. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
38. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
39. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
40. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
41. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
42. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
43. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
44. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
45. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
47. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
48. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
49. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
50. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
51. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
52. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
53. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. 
[2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. 
(2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
[2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. 
CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. 
Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Brousseau, P.-A., Roy, S.: Calibration of axial fisheye cameras through generic virtual central models. In: Int. Conf. Comput. Vis. (2019) Ramalingam and Sturm [2017] Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017) Zhang et al. [2015] Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. 
[2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Melo et al. [2013] Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line“ approach. In: Int. Conf. Comput. Vis. (2013) Kannala and Brandt [2006] Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
(2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. 
(2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symp. (IV) (2018)
Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. 
IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. 
Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. 
[2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. 
Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. 
(2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. 
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. 
(2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. 
(2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. 
Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
10. Ramalingam, S., Sturm, P.: A unifying model for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1309–1319 (2017)
11. Zhang, M., Yao, J., Xia, M., Li, K., Zhang, Y., Liu, Y.: Line-based multi-label energy optimization for fisheye image rectification and calibration. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
12. Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a “plumb-line” approach. In: Int. Conf. Comput. Vis. (2013)
13. Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1335–1340 (2006)
14. Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018)
15. Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
16. Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021)
17. Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
18. Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
19. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
20. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
21. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018)
22. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
23. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
24. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
25. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
26. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
27. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
28. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
29. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
30. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
31. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
32. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
33. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
34. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
35. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
36. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
37. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
38. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Sys. (2022)
39. Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
40. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
41. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
42. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
43. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
44. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
45. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
47. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
48. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
49. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
50. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
51. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
52. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
53. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intelligent Transportation Systems 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
[2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. 
[2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. 
(2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. 
[2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. 
Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. 
Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learning (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romera, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Syst. 23(8), 10252–10261 (2021)
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. 
[2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. 
Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 
13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. 
IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transp. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transp. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 1335–1340 (2006) Yin et al. [2018] Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. 
arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: Fisheyerecnet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018) Xue et al. [2019] Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. 
Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learning (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. 
Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. 
Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. 
Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
12. Melo, R., Antunes, M., Barreto, J.P., Falcão, G., Gonçalves, N.: Unsupervised intrinsic calibration from a single frame using a "plumb-line" approach. In: Int. Conf. Comput. Vis. (2013)
13. Kannala, J., Brandt, S.S.: A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1335–1340 (2006)
14. Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: FisheyeRecNet: A multi-context collaborative deep network for fisheye image rectification. In: Eur. Conf. Comput. Vis. (2018)
15. Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
16. Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3D object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021)
17. Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
18. Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
19. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
20. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
21. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
22. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
23. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
24. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
25. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
26. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
27. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
28. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
29. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
30. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
31. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
32. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
33. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
34. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
35. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
36. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
37. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
38. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
39. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
40. Devernay, F., Faugeras, O.: Straight lines have to be straight: automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
41. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
42. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
43. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
44. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
45. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
47. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
48. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
49. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
50. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
51. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
52. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
53. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. 
IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. 
[2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. 
IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. 
[2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. 
[2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. 
Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. 
Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transp. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  14. Xue, Z., Xue, N., Xia, G., Shen, W.: Learning to calibrate straight lines for fisheye image rectification. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Plaut et al. [2021] Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. 
[2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021) Playout et al. [2021] Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. 
[2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. 
Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. Technical report, University of Iowa (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learn. (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
(2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. 
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022a] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022b] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. 
IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. 
Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
  15. Plaut, E., Ben Yaacov, E., El Shlomo, B.: 3d object detection from a single fisheye image without a single fisheye training image. In: IEEE Conf. Comput. Vis. Pattern Recog. Worksh. (2021)
  16. Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021)
  17. Ahmad, O., Lécué, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
  18. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
  19. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  20. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
  21. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learn. (2019)
  22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
  23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
  24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
  26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  55. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  57. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  58. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  59. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  60. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  61. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  62. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  63. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  64. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  65. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  66. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  67. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  68. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  69. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  70. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  71. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  72. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  73. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  74. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  75. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
  76. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. 
In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. 
IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
  16. Playout, C., Ahmad, O., Lécué, F., Cheriet, F.: Adaptable deformable convolutions for semantic segmentation of fisheye images in autonomous driving systems. CoRR abs/2102.10191 (2021) Ahmad and Lecue [2022] Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. 
(2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022) Dai et al. [2017] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. 
(2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017) Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical cnns. In: Int. Conf. Learn. Represent. (2018) Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020) Zhou et al. 
[2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021) Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
[2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
(2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. 
Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
17. Ahmad, O., Lecue, F.: FisheyeHDK: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In: AAAI (2022)
18. Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: Int. Conf. Comput. Vis. (2017)
19. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
20. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
21. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model? In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids.
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
Zhu et al. [2019] Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Cohen et al. [2018] Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
Cohen et al. [2019] Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
Zhou et al. [2021] Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia et al. [2022] Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021)
Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019)
Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Sys. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int.
[2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. 
Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 
13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  19. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets v2: More deformable, better results. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  20. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
  21. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learning (2019)
  22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
  23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
  24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial Swin transformer. In: Int. Conf. Comput. Vis. (2023)
  26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 3rd IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. 
IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. 
Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. 
Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. 
(2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. 
[2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  20. Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Int. Conf. Learn. Represent. (2018)
  21. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. on Mach. Learning (2019)
  22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
  23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
  24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
  26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Robot. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 
29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
21. Cohen, T.S., Weiler, M., Kicanaoglu, B., Welling, M.: Gauge equivariant convolutional networks and the icosahedral CNN. In: Int. Conf. Mach. Learn. (2019)
22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. 
[2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. 
Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  22. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: Int. Conf. Learn. Represent. (2020)
  23. Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: Deepvit: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
  24. Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
  26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. 
IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Hou, Q., Feng, J.: DeepViT: Towards deeper vision transformer. arXiv preprint arXiv:2103.11886 (2021)
Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Athwale et al. [2023] Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: Darswin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023) Cao et al. [2021] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021) Xiong et al. [2019] Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. 
IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. 
[2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A Unified Panoptic Segmentation Network (2019) Zioulis et al. [2018] Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. 
[2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: Omnidepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018) Yun et al. [2022] Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. 
Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 
13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  25. Athwale, A., Afrasiyabi, A., Lagüe, J., Shili, I., Ahmad, O., Lalonde, J.-F.: DarSwin: Distortion aware radial swin transformer. In: Int. Conf. Comput. Vis. (2023)
  26. Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Sys. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
  55. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  56. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  57. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  58. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  59. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  60. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  61. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  62. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  63. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  64. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  65. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  66. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  67. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  68. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  69. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  70. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  71. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  72. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  73. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  74. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  75. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
  76. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360⁢°360°360 ⁢ ° videos. In: Eur. Conf. Comput. Vis. (2022) Zhang et al. [2022] Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 
13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022) Fernandez-Labrador et al. [2018] Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018) Su and Grauman [2019] Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. 
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like pure transformer for medical image segmentation (2021)
Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. 
Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. 
[2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  27. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: UPSNet: A unified panoptic segmentation network (2019)
  28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
  29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
  30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. 
Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. 
arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
28. Zioulis, N., Karakottas, A., Zarpalas, D., Daras, P.: OmniDepth: Dense depth estimation for indoors spherical panoramas. In: Eur. Conf. Comput. Vis. (2018)
29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. Tech. Rep. 95-01, Dept. of Computer Science, University of Iowa (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model? In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019) Rashed et al. [2021] Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. 
[2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 
30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. 
[2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. 
IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. 
(2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. 
Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
29. Yun, H., Lee, S., Kim, G.: Panoramic vision transformer for saliency detection in 360° videos. In: Eur. Conf. Comput. Vis. (2022)
30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Sys. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Sys. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. 
Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. 
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. 
IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
30. Zhang, J., Yang, K., Ma, C., Reiß, S., Peng, K., Stiefelhagen, R.: Bending reality: Distortion-aware transformers for adapting to panoramic semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. 
IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. 
Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. 
[2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. 
[2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  31. Fernandez-Labrador, C., Perez-Yus, A., Lopez-Nicolas, G., Guerrero, J.J.: Layouts from panoramic images with geometry and deep learning. IEEE Rob. Autom. Letters 3(4), 3153–3160 (2018)
  32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
  33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
  34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Intell. Transport. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Intell. Transport. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021) Rashed et al. [2020] Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. 
arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: Fisheyeyolo: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020) Ye et al. [2020] Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. 
(2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied Optics 49(17), 3338–3347 (2010)
32. Su, Y.-C., Grauman, K.: Kernel transformer networks for compact spherical convolution. In: IEEE Conf. Comput. Vis. Pattern Recog. (2019)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Transport. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Transport. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020) Kumar et al. [2021] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021) Kumar [2022] Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. 
Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. 
[2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Transa. on Int. Transport. Sys. (2022) Liao et al. [2022] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022) Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Devernay, F., Faugeras, O.: Straight lines have to be straight automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001) Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. 
[2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
33. Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., Yogamani, S.: Generalized object detection on fisheye cameras for autonomous driving: Dataset, representations and baseline. In: Wint. Conf. Appl. Comput. Vis. (2021)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romera, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
34. Rashed, H., Mohamed, E., Sistu, G., Ravi Kumar, V., Eising, C., Sallab, A., Yogamani, S.: FisheyeYOLO: Object detection on fisheye cameras for autonomous driving. In: Adv. Neural Inform. Process. Syst. (2020)
35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., pp. 648–655 (2020)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. 
Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. 
[2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
  35. Ye, Y., Yang, K., Xiang, K., Wang, J., Wang, K.: Universal semantic segmentation for fisheye urban driving images. In: IEEE Int. Conf. Syst. Man Cyber., 648–655 (2020)
  36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
  38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romera, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
36. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. 
[2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. 
(2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
37. Kumar, V.R.: Surround-view cameras based holistic visual perception for automated driving. IEEE Trans. Int. Trans. Syst. (2022)
38. Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. 
[2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. 
In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
38. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Devernay and Faugeras [2001] Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
Kim et al. [2022] Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022). https://doi.org/10.1109/ACCESS.2022.3228297
Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Hold-Geoffroy et al.
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. 
[2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. 
In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. 
[2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. 
In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. 
[2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
  39. Devernay, F., Faugeras, O.: Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Mach. Vis. Appl. 13 (2001)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297
  41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
  42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: FishFormer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Veličković, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. 
[2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. 
IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. 
[2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. 
(2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. 
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. 
(2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. 
Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  40. Kim, B., Lee, D., Min, K., Chong, J., Joe, I.: Global convolutional neural networks with self-attention for fisheye image rectification. IEEE Access 10, 129580–129587 (2022) https://doi.org/10.1109/ACCESS.2022.3228297 Liao et al. [2020] Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. 
Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020) Liao et al. [2021] Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021) Yang et al. [2021] Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021) Feng et al. [2023] Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: Simfir: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023) Yang et al. [2023] Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization (2023) Yang et al. [2022] Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022) Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012) Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015) Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
41. Liao, K., Lin, C., Zhao, Y., Xu, M.: Model-free distortion rectification framework bridged by distortion distribution map. IEEE Trans. Image Process. 29, 3707–3718 (2020)
42. Liao, K., Lin, C., Zhao, Y.: A deep ordinal distortion estimation approach for distortion rectification. IEEE Trans. Image Process. 30, 3362–3375 (2021)
43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  43. Yang, S., Lin, C., Liao, K., Zhang, C., Zhao, Y.: Progressively complementary network for fisheye image rectification using appearance flow. In: IEEE Conf. Comput. Vis. Pattern Recog. (2021)
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robot. Autom. Lett. 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model? In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. 
[2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
  44. Feng, H., Wang, W., Deng, J., Zhou, W., Li, L., Li, H.: SimFIR: A simple framework for fisheye image rectification with self-supervised representation learning. In: Int. Conf. Comput. Vis. (2023)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. 
Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  45. Yang, S., Lin, C., Liao, K., Zhao, Y.: Dual diffusion architecture for fisheye image rectification: Synthetic-to-real generalization (2023)
  46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. 
Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
46. Yang, S., Lin, C., Liao, K., Zhao, Y.: Fishformer: Annulus slicing-based transformer for fisheye rectification with efficacy domain exploration. arXiv preprint arXiv:2207.01925 (2022)
47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conf. Pattern Recog. (ACPR) (2015)
50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
51. Sáez, Á., Bergasa, L.M., Romera, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022)
59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023)
63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015) Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021) 2104.13478 Sáez et al. [2018] Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. 
IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. 
Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Sáez, ., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: Cnn-based fisheye image real-time semantic segmentation. In: 2018 IEEE Intelligent Vehicles Symposium (IV) (2018) Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021) Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021) Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 
50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. 
[2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. 
[2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022)
  47. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Adv. Neural Inform. Process. Syst. (2012)
  48. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. 
[2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Szegedy et al. [2015] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conf. Comput. Vis. Pattern Recog. (2015)
Liu and Deng [2015] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
Bronstein et al. [2021] Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
Kumar et al. [2021a] Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
Kumar et al. [2021b] Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Trans. Int. Trans. Syst. 23(8), 10252–10261 (2021)
Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  49. Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: IAPR Asian Conference on Pattern Recognition (ACPR) (2015)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 
54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  50. Bronstein, M.M., Bruna, J., Cohen, T., Velickovic, P.: Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. CoRR abs/2104.13478 (2021)
  51. Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-supervised monocular depth estimation with ordinal distillation for fisheye cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sitsu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: OmniDet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mäder, P.: SynDistNet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: SVDistNet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: SlaBins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mäder, P.: FisheyeDistanceNet: Self-supervised scale-aware distance estimation using monocular fisheye camera for autonomous driving (2020)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mäder, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964). https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: International Conference on 3D Vision (3DV) (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. 
Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. 
(2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
Sáez et al. [2018] Sáez, Á., Bergasa, L.M., Romeral, E., López, E., Barea, R., Sanz, R.: CNN-based fisheye image real-time semantic segmentation. In: IEEE Intelligent Vehicles Symposium (IV) (2018)
Yan et al. [2021] Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
Chang et al.
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021) Kumar et al. [2021c] Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021) Wei et al. [2023] Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 
2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Wei, X., Ran, Z., Lu, X.: Dcpb: Deformable convolution based on the poincare ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023) Vaswani et al. [2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. 
[2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. 
Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. 
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. 
International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. 
Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. 
[2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  52. Yan, Q., Ji, P., Bansal, N., Ma, Y., Tian, Y., Xu, Y.: FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras (2021)
  53. Kumar, V.R., Yogamani, S., Rashed, H., Sistu, G., Witt, C., Leang, I., Milz, S., Mäder, P.: Omnidet: Surround view cameras based multi-task visual perception network for autonomous driving. IEEE Robotics and Automation Letters 6(2), 2830–2837 (2021)
  54. Kumar, V.R., Klingner, M., Yogamani, S., Milz, S., Fingscheidt, T., Mader, P.: Syndistnet: Self-supervised monocular fisheye camera distance estimation synergized with semantic segmentation for autonomous driving. In: Wint. Conf. Appl. Comput. Vis. (2021)
  55. Kumar, V.R., Klingner, M., Yogamani, S., Bach, M., Milz, S., Fingscheidt, T., Mäder, P.: Svdistnet: Self-supervised near-field distance estimation on surround view fisheye cameras. IEEE Transactions on Intelligent Transportation Systems 23(8), 10252–10261 (2021)
[2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. 
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. 
(2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. 
Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. 
(2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  56. Wei, X., Ran, Z., Lu, X.: DCPB: Deformable convolution based on the Poincaré ball for top-view fisheye cameras. In: Int. Conf. Comput. Vis. (2023)
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. 
[2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  57. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Adv. Neural Inform. Process. Syst. (2017) Zhang et al. [2022] Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation (2022) Lee et al. [2023] Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. 
[2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023) Shi et al. [2023] Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: Fishdreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023) Kumar et al. [2020] Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. [2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020) Kumar et al. 
[2023] Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models (2023) Beck [1925] Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. 
(2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925) Hill [1924] Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924) Fleck [1995] Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995) Miyamoto [1964] Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. 
(2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060 Hughes et al. [2010] Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied optics 49(17), 3338–3347 (2010) Barreto [2006] Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006) Mei and Rives [2007] Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007) Ying and Hu [2004] Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. 
Applied Optics 49(17), 3338–3347 (2010)
  58. Zhang, J., Yang, K., Shi, H., Reiß, S., Peng, K., Ma, C., Fu, H., Torr, P.H.S., Wang, K., Stiefelhagen, R.: Behind every domain there is a shift: Adapting distortion-aware vision transformers for panoramic semantic segmentation (2022)
  59. Lee, J., Cho, G., Park, J., Kim, K., Lee, S., Kim, J.-H., Jeong, S.-G., Joo, K.: Slabins: Fisheye depth estimation using slanted bins on road environments. In: Int. Conf. Comput. Vis. (2023)
[2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004) Hold-Geoffroy et al. [2023] Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. 
[2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. 
arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  60. Shi, H., Li, Y., Yang, K., Zhang, J., Peng, K., Roitberg, A., Ye, Y., Ni, H., Wang, K., Stiefelhagen, R.: FishDreamer: Towards fisheye semantic completion via unified image outpainting and segmentation (2023)
  61. Kumar, V.R., Hiremath, S.A., Milz, S., Witt, C., Pinnard, C., Yogamani, S., Mader, P.: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving (2020)
arXiv preprint arXiv:2204.13892 (2022) Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(9) (2023) Liu et al. [2022] Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s (2022) Ning et al. [2020] Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020) Chang et al. [2017] Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. [2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3d: Learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV) (2017) Xiao et al. 
[2018] Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018) Eigen et al. [2014] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014) Shu et al. [2022] Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022) Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: Sidert: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)
  62. Kumar, V.R., Yogamani, S., Bach, M., Witt, C., Milz, S., Mader, P.: UnRectDepthNet: Self-supervised monocular depth estimation using a generic framework for handling common camera distortion models (2023)
  63. Beck, C.: Apparatus to photograph the whole sky. J. Scientific Inst. 2(4), 135–139 (1925)
  64. Hill, R.: A lens for whole sky photographs. Quart. J. Royal Meteo. Soc. 50(211), 227–235 (1924)
  65. Fleck, M.M.: Perspective projection: The wrong imaging model. IEEE Trans. Reliability (1995)
  66. Miyamoto, K.: Fish eye lens. J. Opt. Soc. Am. 54(8), 1060–1061 (1964) https://doi.org/10.1364/JOSA.54.001060
  67. Hughes, C., Denny, P., Jones, E., Glavin, M.: Accuracy of fish-eye lens models. Applied Optics 49(17), 3338–3347 (2010)
  68. Barreto, J.P.: A unifying geometric representation for central projection systems. Comput. Vis. Img. Underst. (2006)
  69. Mei, C., Rives, P.: Single view point omnidirectional camera calibration from planar grids. In: Int. Conf. Robot. Aut. (2007)
  70. Ying, X., Hu, Z.: Can we consider central catadioptric cameras and fisheye cameras within a unified imaging model. In: Eur. Conf. Comput. Vis. (2004)
  71. Hold-Geoffroy, Y., Piché-Meunier, D., Sunkavalli, K., Bazin, J.-C., Rameau, F., Lalonde, J.-F.: A perceptual measure for deep single image camera and lens calibration. IEEE Trans. Pattern Anal. Mach. Intell. 45(9) (2023)
  72. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: IEEE Conf. Comput. Vis. Pattern Recog. (2022)
  73. Ning, K., Xie, L., Wu, F., Tian, Q.: Polar relative positional encoding for video-language segmentation. In: Int. Joint Conf. Art. Intel. (2020)
  74. Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: Int. Conf. 3D Vis. (2017)
  75. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Eur. Conf. Comput. Vis. (2018)
  76. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inform. Process. Syst. (2014)
  77. Shu, C., Chen, Z., Chen, L., Ma, K., Wang, M., Ren, H.: SideRT: A real-time pure transformer architecture for single image depth estimation. arXiv preprint arXiv:2204.13892 (2022)